-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Refactoring doc values sparse index enabling for the host.name field
#121751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring doc values sparse index enabling for the host.name field
#121751
Conversation
host.namehost.name field
| Property.Final | ||
| ); | ||
|
|
||
| public static final FeatureFlag DOC_VALUES_SPARSE_INDEX = new FeatureFlag("doc_values_sparse_index"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the feature flag here since we now use it to enable the setting or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the setting here, but move the setting registration to logsdb plugin?
| } | ||
| return indexed.isConfigured() == false | ||
| && hasDocValues.getValue() | ||
| return hasDocValues.getValue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I remove the check on index value and refactored the way we check sorting on host.name.
| assertTrue(mapper.fieldType().hasDocValues()); | ||
| assertFalse(mapper.fieldType().isIndexed()); | ||
| assertFalse(mapper.fieldType().hasDocValuesSparseIndex()); | ||
| assertTrue(mapper.fieldType().hasDocValuesSparseIndex()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has changed because the setting overrides the index parameter.
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
| public void testFieldTypeWithSkipDocValues_LogsDbMode() throws IOException { | ||
| assumeTrue("Needs feature flag to be enabled", FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()); | ||
|
|
||
| public void testFieldTypeWithSkipDocValues_LogsDbModeDisabledSetting() throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now all these tests written in such a way that they work both for snapshot and release builds.
|
@elasticsearchmachine test this please. |
| && IndexMode.LOGSDB.equals(indexMode) | ||
| && HOST_NAME.equals(fullFieldName) | ||
| && (indexSortConfig != null && indexSortConfig.hasPrimarySortOnField(HOST_NAME)); | ||
| && indexSortConfigByHostName(indexSortConfig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I assume we would like to create the sparse index on host.name no matter if it is the first field or not in the index sort configuration.
| */ | ||
| public final class IndexScopedSettings extends AbstractScopedSettings { | ||
|
|
||
| public static final Set<Setting<?>> BUILT_IN_INDEX_SETTINGS = Set.of( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe register the setting in logsdb plugin? Then we don't have to make this big change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to move also tests in KeywordFieldMapperTests which are LogsDB-specific in LogsDB plugin...because getPlugins() needs the LogsDB plugin too which we can't have as a dependency in the existing KeywordFieldMapperTests. Not having the LogsDB plugin in the test results in failures because the setting is not registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, can we avoid this big diff here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note that MapperTestCase does not register settings provided by other plugins (like LogsDB)...so I would need to implement the logic there to be able to register settings provided by other plugins.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUILT_IN_INDEX_SETTINGS is used by a few tests directly...which is why I had to do it this way. Those tests run both in snapshot and release builds, which means they fail il the setting is not registered because we register the setting conditionally depending on the feature flag. So if the setting is registered only in snapshot builds...release tests will fail with something like..."setting ... has not been registered".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to clarify why I need the change. The original set BUILT_IN_INDEX_SETTINGS is defined using Set.of which returns an immutable set. As a result I cannot just conditionally add a new setting based on a feature flag. The add method call on an immutable set would result in throwing an UnsupportedOperationException as a result of trying to modify a collection that is immutable. So here what I do its...create a mutable set, conditionally add the setting and later on make a copy to have the static variable as an immutable set again.
Another option would be to change existing tests to conditionally use a different set based on a feature flag...I would not do that anyway.
| private final int ignoreAboveDefault; | ||
| private final IndexMode indexMode; | ||
| private final IndexSortConfig indexSortConfig; | ||
| private final boolean useDocValuesSParseIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe rename to: useDocValuesSkipper ?
|
|
||
| public static final FeatureFlag DOC_VALUES_SPARSE_INDEX = new FeatureFlag("doc_values_sparse_index"); | ||
| public static final Setting<Boolean> USE_DOC_VALUES_SPARSE_INDEX = Setting.boolSetting( | ||
| "index.mapping.use_doc_values_sparse_index", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the lucene name. So use_doc_values_skipper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that in IndexVersions I think I cannot rename public static final IndexVersion HOSTNAME_DOC_VALUES_SPARSE_INDEX = def(9_008_0_00, Version.LUCENE_10_0_0);
I will rename everything else.
| Property.Final | ||
| ); | ||
|
|
||
| public static final FeatureFlag DOC_VALUES_SPARSE_INDEX = new FeatureFlag("doc_values_sparse_index"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Keep the setting here, but move the setting registration to logsdb plugin?
|
Can we avoid hardcoding |
The idea for now is the hardcode it. Enabling doc value skipper only makes sense for fields part of index sorting, so I don't think for now it makes sense to make it generic. Which fields / field types end up supporting this depends on research yet to be done. For tsdb maybe |
Also if we expose it as a mapping parameter users will be able to use
This has been discussed and we decided not to have it exposed as a mapping parameter to avoid users start defining sparse indices on fields of their choice. Also, putting the logic to expose the mapping parameter behind a feature flag:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments, LGTM otherwise.
| } | ||
| } | ||
|
|
||
| public boolean hasSortOnFiled(final String fieldName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| public boolean hasSortOnFiled(final String fieldName) { | |
| public boolean hasSortOnField(final String fieldName) { |
| final KeywordFieldMapper mapper = (KeywordFieldMapper) mapperService.documentMapper().mappers().getMapper("host.name"); | ||
| assertTrue(mapper.fieldType().hasDocValues()); | ||
| assertFalse(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings) && mapper.fieldType().isIndexed()); | ||
| assertEquals(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings), mapper.fieldType().hasDocValuesSkipper()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assertEquals(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings), mapper.fieldType().hasDocValuesSkipper()); | |
| if (FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()) { | |
| assertTrue(mapper.fieldType().hasDocValuesSkipper()); | |
| } else { | |
| assertFalse(mapper.fieldType().hasDocValuesSkipper()); | |
| } |
I think this is easier to read?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wanted to have a different name for the setting and the feature flag...also to make it easier later to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the index setting is actually index.mapping.use_doc_values_skipper
| final KeywordFieldMapper mapper = (KeywordFieldMapper) mapperService.documentMapper().mappers().getMapper("host.name"); | ||
| assertTrue(mapper.fieldType().hasDocValues()); | ||
| assertFalse(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings) && mapper.fieldType().isIndexed()); | ||
| assertEquals(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings), mapper.fieldType().hasDocValuesSkipper()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assertEquals(IndexSettings.USE_DOC_VALUES_SKIPPER.get(settings), mapper.fieldType().hasDocValuesSkipper()); | |
| if (FieldMapper.DOC_VALUES_SPARSE_INDEX.isEnabled()) { | |
| assertTrue(mapper.fieldType().hasDocValuesSkipper()); | |
| } else { | |
| assertFalse(mapper.fieldType().hasDocValuesSkipper()); | |
| } |
and same for the other similar assertions?
This PR extends the work done in #121751 by enabling a sparse doc values index for the @timestamp field in LogsDB. Similar to the previous PR, the setting index.mapping.use_doc_values_skipper will override the index mapping parameter when all of the following conditions are met: * The index mode is LogsDB. * The field name is @timestamp. * Index sorting is configured on @timestamp (regardless of whether it is a primary sort field or not). * Doc values are enabled. This ensures that only one index structure is defined on the @timestamp field: * If the conditions above are met, the inverted index is replaced with a sparse doc values index. * This prevents both the inverted index and sparse doc values index from being enabled together, reducing unnecessary storage overhead. This change aligns with our goal of optimizing LogsDB for storage efficiency while possibly maintaining reasonable query latency performance. It will enable us to run benchmarks and evaluate the impact of sparse indexing on the @timestamp field as well.
In this PR, we change how the doc values sparse index is enabled for the
host.namekeyword field.The initial implementation of the sparse index for
host.namewas introduced in #120741.Previously, the choice between using an inverted index or a doc values sparse index was determined by the
indexparameter. With this change, we introduce a new final index-level setting,index.mapping.use_doc_values_sparse_index:true, we enable the sparse index and omit the inverted index forhost.name.false(default), we retain the inverted index instead.Additionally, this setting is only exposed if the
doc_values_sparse_indexfeature flag is enabled.This change simplifies enabling the doc values sparse index and makes the selection of indexing strategies explicit at the index level. Moreover, the setting is not dynamic and is exposed only for stateful deployments.
The plan is to enable this setting in our nightly benchmarks and evaluate its impact on LogsDB indexing throughput, storage footprint and query latency. Based on benchmarking results, we will decide whether to adopt the sparse index and determine the best way to configure it.